Library and functions import

library(SpaceTrooper)

Setting directory: path to where all needed DBKero input files are stored. You can download all of them at: https://kero.hgc.jp/cgi-bin/download/Breast_Cancer_data/CosMX_data_Case2.zip - Run5810_Case2_exprMat_file.csv - count matrix - Run5810_Case2_fov_positions_file.csv - fov position file - Run5810_Case2_metadata_file.csv - cell metadata - Run5810_Case2-polygons.csv - cell polygons

dirname <- "/Users/benedettabanzi/Documents/DBKero/CosMX_data_Case2"
sample_name <- "CosMx_DBKero_BC"

Data import and SpatialExperiment object creation

SpatialExperiment object creation

The structure is very similar to SeuratObject. It’s an R/Bioconductor S4 class with nested slots.

spe <- readCosmxSPE(dirname = dirname, sample_name=sample_name, fov_dims=c(xdim=4256, ydim=4256))
spe
## class: SpatialExperiment 
## dim: 1010 59284 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(1010): RAMP2 CD83 ... NegPrb09 NegPrb10
## rowData names(0):
## colnames(59284): f1_c1 f1_c2 ... f45_c1993 f45_c1994
## colData names(20): fov cellID ... Max.DAPI sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id

To learn more

SpatialExperiment (SPE) is provided with some matrix-like methods such as:

dim(spe)
## [1]  1010 59284

As you can notice, the structure of the count matrix inside the SPE object is genes x cells. Unique cell identifiers are constructed by readCosmxSPE as f(ov) number followed by c(ell) number.

colnames(spe)[1:3]
## [1] "f1_c1" "f1_c2" "f1_c3"

It is possible to subset using square brackets synchronizing across all attributes.

sub_spe <- spe[, spe$fov == 11]

SPE comes with a 1st hierarchy of slots as follows. To perform SpaceTrooper preprocessing, we focus only on 3 slots: colData, assays and metadata.

str(spe, max.level = 2)
## Formal class 'SpatialExperiment' [package "SpatialExperiment"] with 9 slots
##   ..@ int_elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   ..@ int_colData        :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   ..@ int_metadata       :List of 3
##   ..@ rowRanges          :Formal class 'CompressedGRangesList' [package "GenomicRanges"] with 5 slots
##   ..@ colData            :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   ..@ assays             :Formal class 'SimpleAssays' [package "SummarizedExperiment"] with 1 slot
##   ..@ NAMES              : NULL
##   ..@ elementMetadata    :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   ..@ metadata           :List of 4

Let’s start with “assays” slot: it contains the experiment assay, just like SeuratObject.

Tip: the count matrix here isn’t a sparse matrix of dgCMatrix class but a normal matrix.

str(spe@assays)
## Formal class 'SimpleAssays' [package "SummarizedExperiment"] with 1 slot
##   ..@ data:Formal class 'SimpleList' [package "S4Vectors"] with 4 slots
##   .. .. ..@ listData       :List of 1
##   .. .. .. ..$ counts: int [1:1010, 1:59284] 0 0 0 0 0 0 0 0 0 0 ...
##   .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. ..$ : chr [1:1010] "RAMP2" "CD83" "RYK" "CD5L" ...
##   .. .. .. .. .. ..$ : chr [1:59284] "f1_c1" "f1_c2" "f1_c3" "f1_c4" ...
##   .. .. ..@ elementType    : chr "ANY"
##   .. .. ..@ elementMetadata: NULL
##   .. .. ..@ metadata       : list()

In order to access directly to the count matrix, you must enter 2 levels of slots before, which are lists. This is recurrent in SingleCellExperiment-based data structures, as they tend to organize data into lists. However, they provide accessors to directly access these data with analogous results, just as below:

str(spe@assays@data@listData$counts)
##  int [1:1010, 1:59284] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:1010] "RAMP2" "CD83" "RYK" "CD5L" ...
##   ..$ : chr [1:59284] "f1_c1" "f1_c2" "f1_c3" "f1_c4" ...
#counts(spe) is the accessor

colData slot: it contains cell metadata. If during the creation of the SPE we would have had genes’ metadata they would be stored in a similar slot, called rowData. Again, metadata contained in CosMx metadata input file are inserted into a list.

Tip: sometimes it is useful to export the whole cell metadata and convert them in dataframe to perform different tasks, such as plots with ggplot.

str(spe@colData@listData) #list, needs to be converted in df outside of SPE
## List of 20
##  $ fov                   : int [1:59284] 1 1 1 1 1 1 1 1 1 1 ...
##  $ cellID                : int [1:59284] 1 2 3 4 5 6 7 8 9 10 ...
##  $ cell_id               : chr [1:59284] "f1_c1" "f1_c2" "f1_c3" "f1_c4" ...
##  $ Area                  : int [1:59284] 2732 5164 5405 11342 6800 7768 17039 8481 15033 6254 ...
##  $ AspectRatio           : num [1:59284] 2.31 2.78 1.82 0.97 1.46 1.15 1.42 0.77 0.9 0.89 ...
##  $ CenterX_local_px      : int [1:59284] 1603 1728 1874 1068 4139 3103 1115 3009 804 705 ...
##  $ CenterY_local_px      : int [1:59284] 4237 4232 4227 4146 4153 4084 3996 3969 3938 3845 ...
##  $ Width                 : int [1:59284] 90 136 109 121 115 107 183 95 141 91 ...
##  $ Height                : int [1:59284] 39 49 60 125 79 93 129 123 156 102 ...
##  $ Mean.PanCK            : int [1:59284] 126 261 194 358 320 224 398 126 360 103 ...
##  $ Max.PanCK             : int [1:59284] 531 1782 1098 806 898 1701 1126 1350 1035 1782 ...
##  $ Mean.CD68             : int [1:59284] 41 61 242 53 139 319 88 122 55 53 ...
##  $ Max.CD68              : int [1:59284] 466 459 3009 255 1265 4936 425 3641 1819 4130 ...
##  $ Mean.MembraneStain_B2M: int [1:59284] 603 881 945 72 662 1445 135 1298 65 279 ...
##  $ Max.MembraneStain_B2M : int [1:59284] 4790 9610 5773 673 5373 14173 695 14151 4425 5417 ...
##  $ Mean.CD45             : int [1:59284] 62 51 179 33 114 54 120 43 35 88 ...
##  $ Max.CD45              : int [1:59284] 677 862 5799 594 3393 5718 847 650 4784 6187 ...
##  $ Mean.DAPI             : int [1:59284] 142 115 380 98 181 346 84 33 100 303 ...
##  $ Max.DAPI              : int [1:59284] 999 712 1573 217 1019 2042 216 115 218 1423 ...
##  $ sample_id             : chr [1:59284] "CosMx_DBKero_BC" "CosMx_DBKero_BC" "CosMx_DBKero_BC" "CosMx_DBKero_BC" ...
#colData(spe) # returns the hierarchy before listData

Also in this case some accessors are available and allow you to directly access cell metadata elements.

str(colData(spe)$fov) # vector
##  int [1:59284] 1 1 1 1 1 1 1 1 1 1 ...
#spe$fov

Tip: how to remove a cell metadata variable? Of course tou can choose the obvious solution of assignining NULL to the variable in the SPE object, but here’s also the complex way to do it.

#spe$CenterX_local_px <- NULL

#coldata <- data.frame(colData(spe))
#coldata <- coldata |> select(!c("CenterX_local_px"))
#spe@colData@listData <- as.list(coldata)

metadata slot: it contains cell experiment metadata and it’s used to store further data, such as the fov position file, the fov dimensions, the path to the polygon files and the technology name. Again this slot is provided with an accessor.

str(spe@metadata)
## List of 4
##  $ fov_positions:'data.frame':   45 obs. of  3 variables:
##   ..$ fov        : int [1:45] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ x_global_px: num [1:45] 133519 133519 133519 133519 137785 ...
##   ..$ y_global_px: num [1:45] 36286 32020 27754 23488 23488 ...
##  $ fov_dim      : Named num [1:2] 4256 4256
##   ..- attr(*, "names")= chr [1:2] "xdim" "ydim"
##  $ polygons     : chr "/Users/benedettabanzi/Documents/DBKero/CosMX_data_Case2/Run5810_Case2-polygons.csv"
##  $ technology   : chr "Nanostring_CosMx"
#metadata(spe) it's the accessor

Extra: Some interesting information is stored in a further slot: int_colData. They’re not crucial for preprocessing but could be for downstream analysis. Spatial coordinates x and y of cell centroids in pixel are retrieved by cell metadata and stored here as well, they will be used by plotting functions, plus here are saved dimensionality reductions.

str(spe@int_colData@listData) 
## List of 4
##  $ reducedDims  :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   .. ..@ rownames       : NULL
##   .. ..@ nrows          : int 59284
##   .. ..@ elementType    : chr "ANY"
##   .. ..@ elementMetadata: NULL
##   .. ..@ metadata       : list()
##   .. ..@ listData       : Named list()
##  $ altExps      :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   .. ..@ rownames       : NULL
##   .. ..@ nrows          : int 59284
##   .. ..@ elementType    : chr "ANY"
##   .. ..@ elementMetadata: NULL
##   .. ..@ metadata       : list()
##   .. ..@ listData       : Named list()
##  $ colPairs     :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
##   .. ..@ rownames       : NULL
##   .. ..@ nrows          : int 59284
##   .. ..@ elementType    : chr "ANY"
##   .. ..@ elementMetadata: NULL
##   .. ..@ metadata       : list()
##   .. ..@ listData       : Named list()
##  $ spatialCoords: num [1:59284, 1:2] 135122 135247 135393 134587 137658 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:59284] "f1_c1" "f1_c2" "f1_c3" "f1_c4" ...
##   .. ..$ : chr [1:2] "CenterX_global_px" "CenterY_global_px"
#spatialCoords(spe) == spe$CenterX_global_px, spe$CenterY_global_px

Count QC and other metrics

Scater QC

spe <- spatialPerCellQC(spe, micronConvFact=0.12)
## Removing 50 cells, they have 0 counts

To learn more

Some new columns have been added by spatialPerCellQC: “sample_id” = sample name, “sum” = total counts per cell, “detected” = unique detected probes per cell, “subsets_NegPrb_sum” = control negative probe counts per cell, “subsets_NegPrb_detected” = unique detected negative probes per cell, “subsets_NegPrb_percent” = percentage of negative probe counts on total counts, “total” = analogous to sum, “control_sum” = sums all control probe counts per cell (in this case only one type of control probes is present), “control_detected” = unique control probe (considering all control probe types) detected per cell, “target_sum” = only target probe counts per cell (total - all control counts), “target_detected” = unique detected target genes per cells, “CenterX_global_px” = centroid x coordinate of the cell in pixel,
“CenterY_global_px” = centroid y coordinate of the cell in pixel, “CenterX_global_um” = centroid x coordinate of the cell in um (multiplied by micron conversion factor), “CenterY_global_um” = centroid y coordinate of the cell in um, “Area_um” = cell area converted to um using (micron conversion factor)^2, “log2AspectRatio” = log2-transformed aspect ratio which is height/width ratio, “ctrl_total_ratio” = all control probe count on total count ratio, “log2CountArea”, log2-transformed target count on cell area in um ratio.

Polygons loading

Adding custom polygons

To make things simpler… make sure to have only one polygons file inside your data directory.

polygons <- readPolygonsCosmx(polygonsFile=metadata(spe)$polygons, type="csv", keepMultiPol=TRUE, verbose=FALSE)

spe <- addPolygonsToSPE(spe, polygons)
## Warning in rownames(polygons) == colnames(spe): longer object length is not a
## multiple of shorter object length

To learn more

Polygons is an sf object stored as an element of colData list. An sf object is a dataframe + a geometry, namely a list of coordinates. Coordinates can be stored in different data structures according to the geometry type they describe (i.e. polygon, multipolygon, point ecc.) SpaceTrooper creates two lists of polygon coordinates - called global and local - the first using the global x and y coordinates and the second with local coordinates.

str(spe$polygons)
## Classes 'sf' and 'data.frame':   59234 obs. of  7 variables:
##  $ cell_id : Factor w/ 59284 levels "f1_c1","f1_c10",..: 1 112 175 186 197 208 219 230 241 2 ...
##  $ is_multi: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ multi_n : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ fov     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ cellID  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ global  :sfc_POLYGON of length 59234; first list element: List of 1
##   ..$ : num [1:22, 1:2] 135166 135166 135165 135164 135157 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  $ local   :sfc_POLYGON of length 59234; first list element: List of 1
##   ..$ : num [1:22, 1:2] 1647 1647 1646 1645 1638 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  - attr(*, "sf_column")= chr "global"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:6] "cell_id" "is_multi" "multi_n" "fov" ...

Exploring the global geometry, it’s a further data structure called sfc object containing a single polygon coordinates per each cell.

spe$polygons$global
## Geometry set for 59234 features 
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 125781 ymin: -50493.18 xmax: 196996.5 ymax: 55873.57
## CRS:           NA
## First 5 geometries:
## POLYGON ((135166 40541.79, 135166 40530.79, 135...
## POLYGON ((135314 40541.79, 135314 40526.79, 135...
## POLYGON ((135445 40541.79, 135446 40540.79, 135...
## POLYGON ((134589 40493.79, 134594 40492.79, 134...
## POLYGON ((137687 40477.79, 137692 40476.79, 137...

Since the global geometry is basically a list, this is how to retrieve the first polygon:

spe$polygons$global[[1]]
## POLYGON ((135166 40541.79, 135166 40530.79, 135165 40527.79, 135164 40525.79, 135157 40518.79, 135143 40508.79, 135141 40507.79, 135132 40504.79, 135127 40503.79, 135113 40503.79, 135108 40504.79, 135105 40505.79, 135099 40508.79, 135091 40514.79, 135081 40524.79, 135080 40526.79, 135079 40529.79, 135078 40534.79, 135078 40535.79, 135079 40540.79, 135080 40541.79, 135166 40541.79))

Since each polygon is composed by multiple points each with an x and a y coordinate, each polygon is stored as a matrix. The following code lines are synonymous ways to return the first polygon in matrix format.

spe$polygons$global[[1]][[1]]
##         [,1]     [,2]
##  [1,] 135166 40541.79
##  [2,] 135166 40530.79
##  [3,] 135165 40527.79
##  [4,] 135164 40525.79
##  [5,] 135157 40518.79
##  [6,] 135143 40508.79
##  [7,] 135141 40507.79
##  [8,] 135132 40504.79
##  [9,] 135127 40503.79
## [10,] 135113 40503.79
## [11,] 135108 40504.79
## [12,] 135105 40505.79
## [13,] 135099 40508.79
## [14,] 135091 40514.79
## [15,] 135081 40524.79
## [16,] 135080 40526.79
## [17,] 135079 40529.79
## [18,] 135078 40534.79
## [19,] 135078 40535.79
## [20,] 135079 40540.79
## [21,] 135080 40541.79
## [22,] 135166 40541.79
#spe$polygons$global[[1]][1]

However, only this code line allow to run matrix methods on the first polygon, not the second line. This can be useful to run operations on polygons coordinates, for example when we compute manually aspect ratio.

spe$polygons$global[[1]][[1]][1,] # returns only the first point coordinates
## [1] 135165.99  40541.79
#spe$polygons$global[[1]][1][1,] doesn't work

Tip: in order to change the active geometry, the one that is used by default by sf functions, you can use the following command and replace global with another geometry name. The same command can be used to rename the active geometry, by replacing global with a desired name.

#st_geometry(spe$polygons) <- "global"

Global sample maps

Cell centroids in FOV grid

The following plot represents all cell centroids on global scale together with the FOV numbered grid, to know at a glance where cells are distributed among FOVs.

plotCellsFovs(spe)

Global polygons map

You can also view the polygons on a global scale.

plotPolygonsSPE(spe, bg_color = "white")
## Warning: The projection of the shape object pols is not known, while it seems to
## be projected.
## Warning: Current projection of shape pols unknown and cannot be determined.

Preliminary plots

Polygons colored by variable

It’s still the previous function but with some different parameters. It allows to color polygons with a gradient scale to represent a continuous variable such as log2AspectRatio.

plotPolygonsSPE(spe, colour_by = "log2AspectRatio", bg_color = "black", border_col=NULL) # def bg is black
## Warning: The projection of the shape object pols is not known, while it seems to
## be projected.
## Warning: Current projection of shape pols unknown and cannot be determined.
## Variable(s) "log2AspectRatio" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Feature map

It’s similar to the previous function but it plots cells’ centroids instead of polygons.

plotCentroidsSPE(spe, colour_by="total", alpha = 1)

plotCentroidsSPE(spe, colour_by="control_sum", isNegativeProbe=TRUE, alpha = 1, order_by = "control_sum")
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

Metric histogram

Simple plot to view single metrics’ distributions as histograms.

plotMetricHist(spe, metric="Area_um")

First flagging

Spatial outlier detection for mean DAPI signal and area in um. In this case, we’re going to perform the outlier detected using both Medcouple (Hubert, M. and Vandervieren, E. 2008) and MAD contained in scater::isOutlier method. Two new variables will be added per each metric to colData slot: metric_outlier_mc and metric_outlier_sc. Both of them contain attributes referring to lower and higher thresholds identified by the two test to identify outliers. However if you wish to perform only one method of outlier detection, you can set method to “mc” or “scuttle”.

spe <- computeSpatialOutlier(spe, compute_by="Mean.DAPI", method="both")

spe <- computeSpatialOutlier(spe, compute_by="Area_um", method="both")

str(spe$Area_um_outlier_mc) # fences for both included
##  'outlier.filter' logi [1:59234] FALSE FALSE FALSE FALSE FALSE FALSE ...
##  - attr(*, "thresholds")= Named num [1:2] 11.7 252.9
##   ..- attr(*, "names")= chr [1:2] "lower" "higher"

Plot metric histogram with thresholds

plotMetricHist(spe, metric="Area_um", use_fences="Area_um_outlier_mc")

Second flagging

Quality score is defined as:

1 / (1 + exp(-1 * log2countbyarea + 1 * abs(log2AspectRatio) * as.numeric(spe$dist_border<50)))

spe <- computeQCScore(spe)
str(spe$quality_score)
##  num [1:59234] 0.0242 0.0883 0.1992 0.0368 0.3513 ...

First, let’s take a look at the different terms defining the quality score

plotQCscoreTerms(spe)

Now quality score weights are optimized in a data-driven fashion. First of all, fixed thresholds flagged cells for 0 counts and control counts on total counts ratio must be identified to label these cells as low quality examples for the training.

spe <- computeFixedFlags(spe)

Quality score optimization through Generalized Linear Model training

spe <- qscoreOpt(spe, plot = TRUE, custom = FALSE)
## Chosen low quality examples: 1685
## Joining with `by = join_by(fov, cellID, cell_id, Area, AspectRatio,
## CenterX_local_px, CenterY_local_px, Width, Height, Mean.PanCK, Max.PanCK,
## Mean.CD68, Max.CD68, Mean.MembraneStain_B2M, Max.MembraneStain_B2M, Mean.CD45,
## Max.CD45, Mean.DAPI, Max.DAPI, sample_id, sum, detected, subsets_NegPrb_sum,
## subsets_NegPrb_detected, subsets_NegPrb_percent, total, control_sum,
## control_detected, target_sum, target_detected, CenterX_global_px,
## CenterY_global_px, CenterX_global_um, CenterY_global_um, Area_um,
## log2AspectRatio, ctrl_total_ratio, log2CountArea, polygons.cell_id,
## polygons.is_multi, polygons.multi_n, polygons.fov, polygons.cellID,
## polygons.global, polygons.local, Mean.DAPI_outlier_mc, Mean.DAPI_outlier_sc,
## Area_um_outlier_mc, Area_um_outlier_sc, dist_border_x, dist_border_y,
## dist_border, quality_score, is_zero_counts, is_ctrl_tot_outlier,
## fixed_filter_out, qscore_train, rn)`
## Computing optimized quality score with coefficients: -0.77, 0.59, -2.62
## `geom_smooth()` using formula = 'y ~ x'

plotMetricHist(spe, metric="quality_score")

plotMetricHist(spe, metric="quality_score_opt")

spe <- computeQscoreFlags(spe, qs_threshold=0.4, opt=TRUE)

Polygon plotting

FirstFilterPlot(spe, fov = c(11:12), theme = "dark", custom = FALSE)
## Warning in get_plot_component(plot, "guide-box"): Multiple components found;
## returning the first one. To return all, use `return_all = TRUE`.

FirstFilterPlot(spe, fov = c(11:12), theme = "light", custom = FALSE)
## Warning in get_plot_component(plot, "guide-box"): Multiple components found;
## returning the first one. To return all, use `return_all = TRUE`.

Plotting flag score values on fov 11 and 12

spe1112 <- spe[,spe$fov %in% c(11:12)]
plotPolygonsSPE(spe1112, colour_by="quality_score", palette="viridis")
## Warning: The projection of the shape object pols is not known, while it seems to
## be projected.
## Warning: Current projection of shape pols unknown and cannot be determined.

spe1112 <- spe[,spe$fov %in% c(11:12)]
plotPolygonsSPE(spe1112, colour_by="quality_score_opt", palette="viridis")
## Warning: The projection of the shape object pols is not known, while it seems to
## be projected.
## Warning: Current projection of shape pols unknown and cannot be determined.

Zoom plot on global map with FOV grid

plotZoomFovsMap(spe, fovs=c(16:19), colour_by="quality_score")

Summary table

out_boundaries <- c(0, 0.1, 
               round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[1],2), 
               round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[2],2),
               round(attr(spe$Area_um_outlier_mc, "thresholds")[1],2), 
               round(attr(spe$Area_um_outlier_mc, "thresholds")[2],2),
               0.4)

flagged_cells <- c(sum(spe$is_zero_counts == TRUE, na.rm = TRUE),
                   sum(spe$is_ctrl_tot_outlier == TRUE, na.rm = TRUE),
                   sum(spe$Mean.DAPI < round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[1],2),na.rm = TRUE),
                   sum(spe$Mean.DAPI > round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[2],2),na.rm = TRUE),
                   sum(spe$Area_um < round(attr(spe$Area_um_outlier_mc, "thresholds")[1],2), na.rm = TRUE), 
                   sum(spe$Area_um > round(attr(spe$Area_um_outlier_mc, "thresholds")[2],2), na.rm = TRUE),
                   sum(spe$quality_score_opt < 0.4, na.rm = TRUE))

unflagged_cells <- c(sum(spe$is_zero_counts == FALSE, na.rm = TRUE),
                     sum(spe$is_ctrl_tot_outlier == FALSE, na.rm = TRUE),
                    sum(spe$Mean.DAPI > round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[1],2) & spe$Mean.DAPI < round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[2],2),na.rm = TRUE),
                     " ",
                     sum(spe$Area_um > round(attr(spe$Area_um_outlier_mc, "thresholds")[1],2) & spe$Area_um < round(attr(spe$Area_um_outlier_mc, "thresholds")[2],2), na.rm = TRUE),
                     " ",
                    sum(!spe$quality_score_opt < 0.4, na.rm = TRUE))

union_flagged <- length(unique(c(spe[,spe$is_zero_counts == TRUE]$cell_id,
                   spe[,spe$Mean.DAPI < round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[1],2) | spe$Mean.DAPI > round(attr(spe$Mean.DAPI_outlier_mc, "thresholds")[2],2)]$cell_id,
                   spe[,spe$Area_um < round(attr(spe$Area_um_outlier_mc, "thresholds")[1],2) | spe$Area_um > round(attr(spe$Area_um_outlier_mc, "thresholds")[2],2)]$cell_id,
                   spe[,spe$quality_score_opt < 0.4]$cell_id)))

percent_flagged <- round(flagged_cells/dim(spe)[2]*100, 2)

stat_df <- data.frame(out_boundaries = out_boundaries, unflagged_cells = unflagged_cells, flagged_cells = flagged_cells, percent_flagged = percent_flagged, row.names = c("Total counts", "Control/total counts ratio", "Mean DAPI signal < lower thr.", "Mean DAPI signal > higher thr.","Cell area in um < lower thr.", "Cell area in um > higher thr.", "Quality score"))

library(knitr)
kable(stat_df, format = "html", col.names = c("Outlier boundaries", "Unflagged cells", "Flagged cells", "% flagged cells"), align = "c", caption = paste0("<center>Flagged cells by different metrics, total cells in whole sample: ", dim(spe)[2], ", total unique flagged cells in whole sample: ", union_flagged, ", % unique flagged cells: ",
round(union_flagged/dim(spe)[2]*100, 2),"</center>"), padding = 1L)
Flagged cells by different metrics, total cells in whole sample: 59234, total unique flagged cells in whole sample: 10144, % unique flagged cells: 17.13
Outlier boundaries Unflagged cells Flagged cells % flagged cells
Total counts 0.00 59234 0 0.00
Control/total counts ratio 0.10 59212 22 0.04
Mean DAPI signal < lower thr. 74.03 58010 495 0.84
Mean DAPI signal > higher thr. 2841.19 729 1.23
Cell area in um < lower thr. 11.66 58776 58 0.10
Cell area in um > higher thr. 252.94 400 0.68
Quality score 0.40 49923 9311 15.72

Bonuses

Scatter plot to view correlation of variables + density plot to view distribution of single variables.

# you have to install this first package
library(ggpointdensity)
# these are already included in SpaceTrooper
library(ggpubr)
library(cowplot)

cell_df <- data.frame(colData(spe))
p1 <- ggplot(data = cell_df) +
  geom_pointdensity(aes(x = sum, y = Area_um)) +
  scale_color_viridis_c() +
  theme(legend.position = "none",
        plot.margin = margin(),
        panel.border = element_blank())

p2 <- ggdensity(data = cell_df, x = "Area_um", color = "#ccccff", 
                fill = "#ccccff", alpha = 0.5) + theme_classic() + 
  theme(legend.position = "none", axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.line.y = element_blank(),
    axis.line.x = element_blank(),
    plot.margin = margin()) + coord_flip()

p3 <- ggdensity(data = cell_df, x = "sum", color = "#ccccff", 
                fill = "#ccccff", alpha = 0.5) + theme_classic() + 
  theme(legend.position = "none", axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.line.x = element_blank(),
    axis.line.y = element_blank(),
    plot.margin = margin()) 

plot_grid(p3, NULL, p1, p2, ncol = 2, align = "hv",
          rel_widths = c(2,1), rel_heights = c(1, 2))